# Multi-image Reasoning
MiniCPM-V 2.6 RK3588 1.1.4
MiniCPM-V 2.6 is a GPT-4V-level multimodal large language model that supports single-image, multi-image, and video understanding, optimized for the RK3588 NPU.
Image-to-Text
Transformers Other

c01zaut
31
3
MiniCPM-V 2.6
MiniCPM-V 2.6 is the latest and most powerful multimodal large model in the MiniCPM-V series, supporting single-image, multi-image, and video understanding with leading performance and extreme efficiency.
Image-to-Text
Transformers Other

jchevallard
118
1
MMICL InstructBLIP T5 XXL
MIT
MMICL is a multimodal vision-language model built on BLIP-2/InstructBLIP that can analyze and understand multiple images while following instructions.
Image-to-Text
Transformers English

BleachNick
156
11
Featured Recommended AI Models